The present invention generally relates to data de-identification.
De-identification of personal information is a challenge in any domain in which the use of individually identifiable information has privacy implications. De-identification of such data can be desirable, but can be a complex task. Various existing scripts which claim to perform “de-identification” merely scramble data, e.g., by changing “John Smith” and “Jane Doe” to “John Doe” and “Jane Smith”. Likewise, in some existing scripts, other fields may be jumbled or may include data that is not semantically accurate. In the healthcare context, such a script not only fails to meet HIPAA requirements, but also fails to produce data that is useful for support and testing purposes, as the de-identified data does not resemble actual data.
One or more needs exist for improvement in data de-identification. These, and other needs, are addressed by one or more aspects of the present invention.